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Abstract 


Causal inference in completely randomized treatment-control studies with binary outcomes 
is discussed from Fisherian, Neymanian and Bayesian perspectives, using the potential outcomes 
framework. A randomization-based justihcation of Fisher’s exact test is provided. Arguing that the 
crucial assumption of constant causal effect is often unrealistic, and holds only for extreme cases, 
some new asymptotic and Bayesian inferential procedures are proposed. The proposed procedures 
exploit the intrinsic non-additivity of unit-level causal effects, can be applied to linear and non¬ 
linear estimands, and dominate the existing methods, as verified theoretically and also through 
simulation studies. 

Keywords: Causal inference; Constant causal effect; Potential outcome; Randomization-based 
inference; Sensitivity analysis 


1. INTRODUCTION 

The theory of causal inference from randomized treatment-control studies using the potential out¬ 
comes model has been well-developed over the past five decades and has been applied extensively 
to randomized experiments in the medical, behavioral and social sciences. The first formal notation 
for potential outcomes was introduced by Neyman (1923) in the development of randomization- 
based inference, and subsequently used by several researchers including Kempthorne (1952) and 
Cox (1958) for drawing causal inference from randomized experiments. The concept was formal¬ 
ized and extended by Rubin (1974, 1977, 1978) for other forms of causal inference from randomized 
experiments and observational studies, and exposition of this transition appears in Rubin (2010). 
The three broad approaches to causal inference under the potential outcomes model are Fisherian, 
Neymanian and Bayesian. 

The most common finite-population estimand in most causal inference problems is the average 
causal effect, defined as the finite-population average of unit-level causal effects. Since Neyman’s 
(1923) seminal work, additivity of unit-level treatment effects (or its lack thereof) and its influence 
on the inference for the average causal effect has been investigated thoroughly for continuous 
outcomes. In comparison, few researchers (e.g., Copas 1973) have studied this problem for binary 
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outcomes, in which the potential and observed outcomes can be summarized in the form of 2 x 2 
contingency tables. In this paper, we provide a characterization of additivity based on the 2x2 table 
of potential outcomes, and use it to (i) justify Fisher’s exact test from a randomization perspective, 
and (ii) propose an estimator of the variance of the average causal effect for binary outcomes that 
uniformly dominates the Neymanian variance estimator. As advocated by Rubin (1978), we also 
propose a Bayesian strategy for drawing inference about the average causal effect using the missing 
data perspective. Such a strategy is dependent on the assumptions related to model additivity, 
or more specifically, the nature and strength of the association between potential outcomes. We 
propose a novel sensitivity analysis which should help a practitioner understand how the analysis 
results might change if the assumptions are violated. 

Apart from the average causal effect, other popular estimands for binary outcomes are the log of 
the causal risk ratio and the log of the causal odds ratio. Although of great practical interest, to the 
best of our knowledge, estimators of these causal measures have not been studied carefully from the 
Neymanian perspective, because unlike the average causal effect, non-linearity of these estimands 
and their estimators make exact variance calculations intractable. We circumvent this problem 
by taking an asymptotic perspective. By deriving asymptotic expressions for variances of these 
estimators, we explore the adequacy of the widespread practice of drawing statistical inference for 
such causal estimands on the basis of independent Binomial models, and propose improved methods 
that are justified by randomization. We conduct simulation studies under different settings to 
demonstrate the effectiveness of the proposed methods and also illustrate their application to a 
recent randomized controlled trial. 

The paper is organized as follows. In the following section, we define the potential outcomes, 
the finite population estimands, the assignment mechanism and the observed outcomes. In Section 
3, we discuss the Fisherian and Neymanian forms of inference for 2x2 tables. In Section 4, 
we propose a Bayesian framework for causal inference, explore its frequentists’ properties and 
also propose a methodology for sensitivity analysis to assess the effect of violation of assumptions 
regarding additivity (or its lack thereof) on the inference. Causal inference for non-linear estimands 
is discussed in Section 5. A detailed simulation study is conducted in Section 6 to compare the 
different methods of inference and to demonstrate the superiority of the proposed methodology. 
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Application of the proposed methodology to randomized experiments with binary outcomes is 
demonstrated with a real-life example in Section 7. Some concluding remarks are presented in 
Section 8. Some technical details are in Appendix A, and the proofs, additional simulation studies 
and more details of the application are in the online Supplementary Materials. 

2. POTENTIAL OUTCOMES, ESTIMANDS, AND THE OBSERVED DATA 

The evolution of the potential outcomes framework was motivated by the need for a clear separation 
between the object of interest (often referred to as the “Science”) and what researchers do to learn 
about the Science (e.g., randomly assign treatments to units). We assume a finite population of 
N experimental units that are exposed to a binary treatment W and yield a binary response Y. 
Under the Stable Unit Treatment Value Assumption (Cox 1958; Rubin 1980), we define Yi{t) as 
the potential outcome for individual i when exposed to treatment t (t = 1 and t = 0 often refer to 
treatment and control, respectively). The N x 2 matrix of the potential outcomes {(yi(l), U(0)) : 
i = 1,, N} is typically referred to as the Science (Rubin 2005). Because the response Y is binary, 
the information contained in the Science can be condensed into a 2 x 2 table as shown in Table 


Table 1: “Science Table” of the Potential Outcomes 



y(o) = 1 

o 

II 

row sum 

y(i) = i 

All 

A^io 

Ai+ 

y(i) = o 

Aoi 

A^oo 

A^o+ 

column sum 

A+i 

N+o 

N 


2.1 Finite-population causal estimands and uniformity of unit-level causal effects 

Having defined the so-called Science, we now proceed to the definition of causal estimands. A 
unit-level causal effect is defined as a contrast between the potential outcomes under the treatment 
and the control, for example, r* = U(l) — U(0). We define the finite population average causal 
effect as 


r = 


1 

N 



= Pi -Po, 


i=l 

where pt = Yi{t)/N is the finite population average of Y{t) for t = 0,1. For binary outcomes, 
the average causal effect is also called the causal risk difference (CRD). From Table it follows 
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that 


r = Ni+/N - N+i/N = {Nio - iVoi) /N. 

A measure of uniformity (or its lack, thereof) of the unit-level causal effects is the finite population 
variance of the individual causal effect r*, given by 

1 ^ 1 

E + A'oi)(iVii + A^oo) + 4iVioiVoi} . (1) 

Note that can also be represented as 

= Sf + Si - 2Sw, ( 2 ) 

where ~ ~ 1) is the finite population variance of the potential outcome 

Yi{t), and Sw = - Pl}{Y^iO) - Po}/{N - 1) = (A^nA^oo - ArioA^oi)/{A^(A^ - 1)} is the 

finite population covariance between li(l) and li(0). 

Note that constant causal effect or additivity of unit-level causal effects implies that Sf = 
S'qjS'io = SiSq, and the uniformity measure = 0. Copas (1973) considered a representation of 
the potential outcomes similar to that in Table and defined parameters a = r as the treatment 
effect and /3 = (A^io + Aoi)/A^ as a measure of “the differential effect.” However, we feel that /3, 
which essentially equals {Yld=i'^i) / ^ i is not an adequate representation of the differential effect, 
because it does not reduce to zero when all unit-level causal effects are equal to 1 or —1. To discuss 
this aspect further, we consider the case of strict additivity of treatment effects, where Ti = t for 
all i = 1,..., iV, and summarize its impact on the Science and its summary measures r, and /3 
in Table m 


Table 2: Effect of additiv^ on the Science 


t{= n) 

Entries of Table 1 

T = a 


/3 

1 

A^ii = A^oi = A"oo = 0, N'lo = N 

1 

0 

1 

-1 

A^ii = N'lo = A^oo = 0, Nqi = N 

-1 

0 

1 

0 

Nio = Noi = 0, fVoo + N'li = A^ 

0 

0 

0 


Note that the last row of Table represents a special case of additivity, with zero treatment 
effect for each unit. Such a hypothesis about the Science case is referred to as Fisher’s sharp null 
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hypothesis of no treatment effect, and forms the basis of the “Fisherian” inference described in 
Section EH 

To sum up our discussion on the degree of uniformity of treatment effects, we define another 
condition referred to as monotonicity (Angrist et al. 1996). 

Definition 1. The Science table is said to satisfy the monotonicity condition if either of the 
following two conditions hold: (i) Tj(l) > Ti(0) for all i (or equivalently Aqi = 0), (ii) li(l) < Tj(0) 
for all i (or equivalently A^io = 0). 

Under monotonicity, we have t = a = (3 = Niq/N, = NioNii/{N{N — 1)} if (i) holds, and 
T = a = —Nqi/N, (3 = Nqi/N, S'^ = NoiNii/{N{N — 1)} if (ii) holds. We also note that any one 
of the three additivity conditions as described in Table implies at least one of the monotonicity 
assumptions. We shall discuss the impact of strict additivity and monotonicity on the inference for 
r in Section EH 

2.2 Treatment assignment and the observed data 

We consider a completely randomized treatment assignment in which Ni and Nq units receive 
treatments 1 and 0 respectively. Let W = (VUi,..., VUv) be the vector of treatment assignments 
and let w = {wi,... ,wi\f) be a realization of W. Then, a completely randomized experiment 
satisfies P{W = w) = NiINq\/N\ if — -^1 P{W = w) otherwise. The observed 

outcomes are deterministic functions of both the treatment and the potential outcomes, since 
y.obs ^ + (1 - Wi)Yi{0). Let = (Y]°^", • • • ,Y§^^) be the vector of the observed 

outcomes. Since the treatment and the outcome are both binary, the observed data form a 2 x 2 
contingency table as shown in Table The row sums in Table {Ni,Nq), are the numbers of 
individuals receiving treatment and control, and the column sums, (n+i,n+o), are the number of 
individuals with outcomes 1 and 0, respectively. 

We conclude this section by emphasizing that the fundamental problem of causal inference is 
the missingness of one element of each pair (1^(1), 1^(0)). Consequently, the key idea is to infer 
about the entries of Table (and the estimands that are functions of these unknown entries) using 
those of Table El and the distribution of these entries under randomization. 
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Table 3; Summary of the Observed Data 



y^obs _ y 

y^obs _ Q 

row sum 

W = 1 

nil 

nio 

Ai 

o 

II 

n-oi 

noo 

No 

column sum 

n+i 

n+o 

N 


3. FISHERIAN AND NEYMANIAN APPROACHES TO INFERENCE 

In this section, the potential outcomes of the finite population are assumed to be fixed numbers, 
and the randomness in the observed outcomes comes only from randomization of the treatment 
assignment (Neyman 1923; Rubin 1990). We discuss two forms of finite-population inference — 
Fisherian and Neymanian — under this set-up. Fisher’s form of randomization-based inference 
focuses on assessing the sharp null hypothesis of no treatment effect using the randomization 
distribution of a test statistic, which is obtained by imputing the missing outcomes under the 
sharp null. Neyman’s form of randomization-based inference can be viewed as drawing inferences by 
evaluating the expectations of statistics over the distribution induced by the assignment mechanism 
in order to calculate a confidence interval for the typical causal effect. Using asymptotic results is 
one way of achieving this. In the following subsection (Section 3.1), we briefly discuss the Fisher 
randomization test and establish its connection to Fisher’s exact test. In Section 3.2, we discuss 
Neymanian inference and propose an improvement over the traditional Neymanian estimator. 

3.1 Fisherian randomization test and its connection to Fisher’s exact test 

According to Fisher (1935a), randomization yields “a reasoned basis for inference,” and it allows for 
testing the sharp null hypothesis of zero individual causal effect, i.e., !)(!) = 1^(0) for z = 1,... ,N, 
characterized by the last row of Table[^ Such a null hypothesis permits imputation of all the missing 
potential outcomes. Although in principle any test statistic can be used, the most natural one is 
T = pi- po, where pi = WiY°^^/Ni = nn/Ai, and po = “ Wi)Y°^^/No = uoi/Nq. 

The test statistic r is a function of both the treatment assignment and the observed outcomes. 
Under the sharp null hypothesis, the randomness of r comes only from the randomization of the 
treatment assignment W. The p-value under the sharp null is a measure of the extremeness of 
the observed value of the test statistic with respect to its randomization distribution under the 
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sharp null. For a two-sided test, the p-value is typically defined as the proportion of values of |t| 
generated under all possible randomizations that exceed its observed value In general, the 

null distribution of r and the p-value can either be calculated exactly, or approximated by Monte 
Carlo. 

However, we can obtain the “exact” distribution of the randomization test statistic for a binary 
outcome. In Table the margins A^i and Nq are fixed by design. Under the sharp null hypothesis, 
the margins n+i and n+o represent the number of observations with potential outcomes 1^(1) = 
li(0) = 1 and Yi{V) = Yi{0) = 0, respectively, and are equal to Nn and Nqq in TableIt follows 
that 

^ nil noi mi n+i - mi N A^n 

r =-=-= -nn-, (d 

Ni No Ni No NiNo No 

i.e., the test statistic r is a monotone function of nn. Therefore, the rejection region based on 
T is equivalent to the rejection region based on nn, which has the usual Hypergeometric null 
distribution of the exact test for a two by two contingency table. 

More interestingly, any randomization test using statistics other than r is also equivalent to the 
test based on nn, since any test statistic is a function of nn under Fisher’s sharp null hypothesis. 
Numerically, the test has exactly the same form as Fisher’s exact test, although the two tests were 
originally derived based on completely different statistical reasonings. In observational studies 
under Multinomial or independent Binomial sampling, Fisher (1935b) justified his exact test for 
association as a conditional test, by arguing that the marginal counts are nearly ancillary. However, 
it turns out that the marginal counts contain some information about the association (Chernoff 
2004), and they are not ancillary. Here, we give a justification of the validity for Fisher’s exact test 
based on randomization, if the data truly come from a completely randomized experiment. For 
more discussion about the hypothesis testing issue, see Berkson (1978), Yates (1984) and Chernoff 
(2004) for observational studies, and Ding (2014) for randomized experiments. 


3.2 Neymanian inference for the average cansal effect 

Neyman (1923) showed that t = pi — po is unbiased for r, with the sampling variance 


var(r) 


No 

NiN 


Sf + 


Ni 

NqN 


si + 4si„ = |i + #- 


N 


Ni No 


Si 

IV’ 


(4) 






where and Sq are defined in Section 2.1 The proof can be found in Neyman (1923) or 


directly from Lemma A.2 in Appendix B. Since the third term in Q, S^/N, depends on the joint 
distribution of the potential outcomes, it is not identifiable from the observed data without further 
assumptions. Because of this difficulty, Neyman (1923) proposed a “conservative” estimator for 
var(r), defined as 


Vns 


*1 




(5) 


neyman , 

where sf = - 1) and -Pof/{No - 1) are the sample 

variances of the observed outcomes under the treatment and the control, respectively. For binary 
outcomes, the variance estimator can be simplified as 


Vat, 


1 


eyman 


Ni{N, - 1 ) 


) + 


1 


NfJ iVo(iVo-l) 


Tin 


nor-^0^ 


Pl(l -Pl) Po(l -Po) 


( 6 ) 


N\ — 1 Nq — 1 

As we will discuss later in Section 5.2 (§ is very close to the standard formula for the variance 
of the difference of sample proportions, except for the fact that the coefficient denominator in the 
latter are instead of — 1 for u; = 0,1. 

The variance estimator VNeyman is “conservative” in the sense that it only unbiasedly estimates 
the first two terms of Q, S^/Ni + Sq/Nq, and therefore E{VNeyman) > var(r), a fact pointed out 
by several authors, e.g., Gadbury (2001), who provided an expression for the bias of the estimator. 
The variance estimator VNeyman is unbiased for the true variance if and only if the individual 
causal effects are constant (r^ = r) or, equivalently, the conditions in Tableare satisfied. Neyman 
(1923)’s constant causal effect assumption is equivalent to using 0 as a lower bound for 5^, which 
is not sharp for binary outcomes. Consequently, Neyman’s “conservative” variance estimator can 
be improved for binary outcomes, even if the potential outcomes are not strictly additive. The 
following result gives the sharp lower bound for S^/N in terms of r. 

Theorem 1. A lower bound for S^/N is 

52 


> 


|r|(l - |t| 


(7) 


N - N-I ’ 

and equality holds if and only if the potential outcomes satisfy the monotonicity condition as stated 
in Definition [H 
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Theorem [T] implies that the monotonicity assumptions are the most “conservative” cases for 
variance estimation. As shown in the proof of Theoremj^ the lower bound Q for is obtained via 
an optimization approach, which minimizes 5^ under the constraints of the marginal distributions 
Pi and pq. Therefore, the lower bound in Q is “sharp”, in the sense that it cannot be uniformly 
improved without further assumptions. 

The lower bound for S'^/N allows us to define the following estimator, which is an improvement 

over Neyman’s variance estimator given by (©: 

- ^ pi(l - Pi) Po(l -Po) _ |r|(l - |r|) 

VNeyman Ni - 1 ^ No - I iV - 1 ' 

This estimator cannot be larger than Visfeyman, and is also an improvement of the variance estimator 
given in Robins (1988). If r G {1, —1,0}, i.e., if = 0, we have |r|(l — |r|) = 0, and with large 
sample size, the adjusting term |t|(1 — \t\)/{N — 1) is of higher order relative to the two leading 
terms of the variance estimator ([^. Therefore, if = 0, then asymptotically, the adjusting term 
does not hurt, and V)veyman yNeyman are equivalent. However, for small samples, we may under¬ 
estimate the true sampling variance due to the positive adjusting term |r|(l — |t|)/(A^ — 1)- We 
will investigate this finite sample issue further in the simulation studies. If the true average causal 
effect is not —1,0, or 1, i.e., 7 ^ 0 the correction term |r|(l — |t|)/(A^ —1) in the variance estimator 

cannot be asymptotically neglected, and the “adjusted” variance estimator will improve Neyman 
(1923)’s original variance estimator. For example, if we observed a two by two table with cell 
counts n'j’Q®, Uq}*®, Uqq®) = (15,5,5,15), then we have pi = 0.75,po = 0.25, V/veyman = 0 . 020 , 
and V^eyman ~ 0.013, with the latter variance estimator 32.48% smaller than the former one. 

4. BAYESIAN CAUSAL INFERENCE FOR BINARY OUTCOMES 

In this section, we adopt the Bayesian causal inference framework advocated by Rubin (1978). We 
assume that all the potential outcomes are drawn from a hypothetical super-population, while we 
are still interested in making inference on the finite population average causal effect r. Similar to 
Neymanian randomization inference, the association between the potential outcomes is also crucial 
for our Bayesian causal inference. We first propose a Bayesian procedure based on a simple model 
with independent potential outcomes, and discuss its frequentists’ repeated sampling property (Ru¬ 
bin 1984). We then propose a sensitivity analysis procedure to investigate the impact of departures 
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from the independence assumption on Bayesian inference. 

4.1 Independent potential outcomes 

Assume that we have the following model with independent potential outcomes; 


Yi{l) ~ Bern(7ri+), ^^(O) ~ Bern(7r+i), li(l)Ayi(0), i = 1, • • • , 


where “IL” denotes independence. The notation (tti+jTt+i) is chosen to be coherent with the 
marginal probabilities in Table to be discussed later. We will relax the independence assumption 


in Section 4.3 We postulate the following priors 7ri+ ~ Beta(ai,/3i), vr+i ~ Beta(ao)/^o); and 
assume that they are independent a priori. 

Since the treatment assignment mechanism is ignorable (Rubin 1978) in completely randomized 
experiments, the joint posterior distribution of 7ri_|_ and vr+i is 


/(7ri+,7r+i I iy,y°"'^)oc7r“|-'(l-7ri+)^i-V“ri(l-7r+i)/'o-i<|i(l-7ri+)-^°<«(l-7r+i)-o°, (9) 


or equivalently, vri+lVR, ~ Beta(nii + ai, nio + /3i), 7r_|_i|VR, ~ Beta(noi + ao) ^oo + /3o), 

and they are independent a posteriori. After obtaining the posterior distribution of (vri+, tt+i), we 
can impute all the missing potential outcomes, conditioning on (7ri+, tt+i). If Wi = 1, we impute 
li(0)|iy, TT+i ~ Bern(7r+i); and if Wi = 0, we impute yj(l)| VT, 7ri+ ~ Bern(7ri+). 

Therefore, the posterior distribution of r conditioning on 7ri_|_ and vr+i is 




TT+l 


nil + Bo- noi - Bi 

N 


( 10 ) 


where Bi ~ Binomial(A^i, vr+i), Bq ~ Binomial(A^O) "^ 1 +)) and they are independent. The descrip¬ 
tion above also illustrates a Monte Carlo strategy for simulating the posterior distribution of r. 
For theoretical comparison with Neymanian inference, we can also obtain the posterior mean and 
variance of r as follows. We give the exact formulae for posterior mean and variance in Appendix 
C online, and here for simplicity we give approximate formulae. 


Theorem 2. Assume that the prior pseudo counts (oq,/ 3o, oi,/3i) are small compared to n^’s. 
The posterior mean of r is 

E{t I VT, y°’’") Ri r, 
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and the posterior variance of r is 


var(r | W, 


No pi(l -pi) Ni po{l -po) 
N A^i-T N No-1 


( 11 ) 


From Theorem we can see that the posterior variance of r is smaller than Neyman’s vari¬ 
ance estimator. These variances are different because Neyman (1923) assumed perfect correlation 
between the potential outcomes, while the Bayesian model assumes independence between the po¬ 
tential outcomes. As shown in Q, the assumption that = 0 is the worst case for the variance of 
var(r), and Neyman (1923) adopted this as the most “conservative” estimator for the true variance. 


4.2 Frequency evaluation of the Bayesian procedure under independence 

Going back to the finite population perspective, the sampling distribution of r depends on the finite 
population covariance between !)(!) and li(0), as shown in (Q. Assuming independence between 
Yi{l) and 1^(0), we have Sio = 0, and Q becomes 


var(r) 


Nq <^2 I -^1 q 2 

NiN^^ ^ NqN^^' 


The variance of r can be unbiasedly estimated by 


{} No 2 , Ni 2 Nopi{l-pi) Nipo{l-po) 
Vind = ^j Si + Sq = — ;- 1 - 


NiN 


NoN 


N Ni-I N No-I 


( 12 ) 


The estimator of the sampling variance of r in (12) and the approximated posterior variance of r 
in 0 under independence are the same. 

Therefore, the Bayesian credible interval under independence will have a correct asymptotic 
coverage property, if the finite population covariance of the potential outcomes is zero. However, if 
the hnite population covariance between Yi{l) and Ti(0), ^lo, is negative, we have 


var(r) < 


1^0 C .2 I -^1 c -2 

NiN^^ ^ NoN^^ 


according to Q, which implies that the Bayesian credible interval will over-cover the truth over 
repeated sampling. If the finite population covariance between Ti(l) and Ti(0), 5io, is positive, the 
Bayesian credible interval may not have a correct frequentists’ coverage property. 
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4.3 Bayesian sensitivity analysis 

The independence between potential outcomes may not be plausible even conditionally on observed 
covariates. In particular, if the potential outcomes are positively correlated, the Bayesian credible 
interval may not have a correct frequentists’ coverage property. However, the observed data provide 
no information about the association between the two potential outcomes, since they are never 
jointly observed. Therefore, we propose a sensitivity analysis approach for the Bayesian model 
discussed above. 


Table 4: Model of the Potential Outcomes 



y(o) = 1 

Y(0) = 0 

row sum 

y(i) = i 

TTii 

TTio 

^ 1 + 

0 

II 

1—1 

TTOI 

TTOO 

1 - TTl + 

column sum 

T^+l 

1 - TT+l 

1 


The joint distribution of (1^(1), 1^(0)) follows a Multinomial distribution with parameters 
(vTii, TTio, vToi, vToo) as showu in Table which can be equivalently characterized by the marginal 
distributions ( 7 ri_|_, vr+i) and an association parameter. We propose a new characterization of asso¬ 
ciation between the potential outcomes in terms of the sensitivity parameter: 

p{y(i) = 1 1 y(o) = 1} TTni-TT+i 


7 = 


G ( 0 , oo) 


P{Y{1) = 1 I y(0) = 0} TTio TT+I 
When the potential outcomes are independent, we have 7 = 1; when tth —?■ 0, we have 7 — 7 - 0 ; 
when TTio —>■ 0, we have 7 —)• 00 . In practice, we propose varying our sensitivity parameter 7 over a 
wide range of values, and performing Bayesian inference at each fixed value of 7 . 

There is a one-to-one mapping between (tth, ttiq, ttoi, vroo) and ( 7 ri_|_, vr+i, 7 ), and thus the cell 
probabilities TTjk’s can be expressed as 


TTii = 
TTOl = TT+i - 


77ri+7r+i 


1 - TT+i -h 77 r+i ’ 

jm+TT+i 

1 - TT+l -h 77 r+l ’ 


TTio = 


7ri+(l - TT+l) 


1 - TT+l -F 77r+i ’ 

TTOO = 1 - TT+l - 7ri+ -I- TTii. 


(13) 

(14) 


Since all the cell probabilities are within the interval [0,1], the equations in (13) and (14) impose 
the following restrictions on ( 7 ri_|_, vr+i, 7 ): 


7 ( 7 ri+ - TT+l) < 1 - vr+i, 77 r+i > 7 ri+ vr+i - 1 . 


(15) 
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The posterior distributions of (7ri+,7r+i) are the same as (§. However, the imputations of the 


missing potential outcomes are different from Section 4.1 For Wi = 1, we impute 


F,(0)|yi(l) = 1 ~ Bern(^= 


y(0)|y(l)=0 ~ Bern 
For VFj = 0, we impute 


7ri+ 1 - vr+i+77r+iy ’ 

TToi TT+i 77ri + 7r+i 


y(l)|y(0)=0 ~ Bern 


71-1+ 

1 - 7ri+ 

Bern | 

^VTii _ 

v^+i 1 

Bern | 

^ TTlo 

V 1 - TT+i 


77ri+ 


^1+ 


Observed data 


fV 


F” 


Potential outcomes 


FI 


F(0) 


1 


1 

/7ll 

— 

1 

«ii 


? 



1 


1 



? 










0 

^10 

0 

^10 


? 

1 


0 


0 



? 









0 


1 

«01 

? 

-^01 " 

. / Jtjj \ 

-Bm|«oi.jt77) 

1 


o 

% 

1 


? 


1 










0 

^00 

? 


. 1 ^ 10 \ 

-Bm«oo.i ^ 

1 ^ ^+1 / 

0 

0 


0 


? 



0 


Ls,, ~ Bin « 


11' Jt, 


S,o~Bin n,o. 


1-Jt, 


Figure 1: Imputation of the Missing Potential Outcomes 


We illustrate the strategy for imputing missing potential outcomes in Figure in which we 
have 

A/-obs ^11 + -^01 + Hoo — Bn — Bio — noi 

t\W,Y , 7ri+, TT+I ~ —-— — - — - - ■ —- -- - — 


where Bn ~ Binomial(nii, 7rii/7ri+), Bio ~ Binomial {nio, vroi/(l — 7ri+)},i?oi ~ Binomial(noi, tth/tt+i), 
Boo ~ Binomial {noo; 7rio/(l — vr+i)}, and {Hu, Hio, Hqi, Hqo} are independent. Note that although 
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the posterior distribution of ( 7 ri_|_, tt+i) does not depend on the association parameter 7 , the pos¬ 
terior distribution of r does. While there is no explicit form of the posterior distribution of r, we 
can approximate it via Monte Carlo. We will apply the proposed sensitivity analysis in Section 

5. GENERAL CAUSAL MEASURES 

Up to now, we have considered the most commonly used causal estimand, the average causal effect 
(or CRD). However, researchers and practitioners are also often interested in the log of the causal 
risk ratio (relative risk) 

'All + Aio' 


log(CRR) = log(pi) - log(po) = log 


All -I- Aoi 


(16) 


and the log of the causal odds ratio 


log(COR) = logit(p.) - logit(p„) = log - log , (17) 

where logit(x) = log(x) — log(l — x). One attractive feature of CRD is that it is linear in the 
individual causal effects. On the contrary, log(CRR) and log(COR) are hnite population level 
causal estimands, which are not simple averages of individual causal effects. The linearity of the 
average causal effect permitted Neyman (1923) to obtain an unbiased estimator with exact variance. 
However, the elegant mathematics of Neyman (1923)’s randomization inference for CRD is not 
directly applicable to nonlinear causal measures. We will fill in the gap by obtaining asymptotic 
randomization inference for log(CRR) and log(COR). 

We can obtain estimators for the log of the causal risk ratio and odds ratio by substituting 


estimators of pi and po in (16) and (17), i.e., log(CRR) = log(pi) — log(po) and log(COR) = 
logit(pi) — logit()3o)- As mentioned earlier, general nonlinear causal measures have not been stud¬ 
ied carefully from the Neymanian perspective, because the absence of linearity makes exact vari¬ 
ance calculations intractable for such measures. Instead, we take an asymptotic perspective in 
this section. In the following subsections, we will (i) propose asymptotic randomization inference 
for log(CRR) and log(COR); (ii) compare them with the results under traditional independent 
Binomial models; (iii) discuss Bayesian inference for the general causal measures. 
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5.1 Neymanian asymptotic randomization inference 

Unfortunately, the unbiasedness is not preserved by plugging p\ and po into the nonlinear functions 
(16) and ( |17[ ). Furthermore, the plug-in estimators do not have finite means or variances, since pi 
and Po can equal to 0 or 1 with positive probabilities. In spite of these limitations, when pi and 
Po are both bounded away from 0 and 1, the estimators log(CRR) and log(COR) have regular 
asymptotic distributions, summarized in the following two theorems. 

Theorem 3. If 0 < po,pi < 1, as —)■ oo, log(CRR) is consistent for log(CRR) and asymptoti¬ 

cally Normal with asymptotic variance 


Nipi + NqPo I' Sf ^ 


52 


(18) 


Npipo \Nipi Nopo Nipi + Nopo, 

Assuming = 0 as in Neyman (1923), we can estimate the asymptotic variance by 

^ _ nio (nil + raoi)jVo upp (nn -F noi)iVi 

“ nuNi noiN + noilVo nuN ' ^ ’ 

Theorem 4. If 0 < po,Pi < 1, as — >■ 00 , log(COR) is consistent for log(COR) and asymptoti¬ 

cally Normal with asymptotic variance 

Nipi{l - pi) + NoPo{l - Po) ( Sf Si 


+ 


Npi{l - pi)po{l - Po) (Aiipi(l-pi) Nopoii-po) Nipi{l - pi) + Nopoil - Po) 
Assuming = 0 as in Neyman (1923), we can estimate the asymptotic variance by 

A 1111 

^COR —-1-1-1-• 

nil nio npi npo 


.( 20 ) 


( 21 ) 


The variance formulae (18) and (20) for log(CRR) and log(COR) are similar to the variance 


formula (|^ for r, depending on the finite population variances of the potential outcomes Sf and 
Sq, and the unidentifiable finite population variance of the individual causal effect Sf. 

Furthermore, borrowing the idea of bias-correction for ratio estimators (Cochran 1977), we can 
obtain bias-corrected estimators for log(CRR) and log(COR), which have lower order asymptotic 
biases than the naive moment estimators. Similar to Neyman’s variance estimator for var(r), the 


variance estimators in (19) and (21) are conservative unless the constant causal effects assumption 
holds. Analogous to the result in ([^ for CRD, using the lower bound for Sf in Theorem 


we 


can improve the variance estimators (19) and (21) for the bias corrected estimators for log(CRR) 


16 



















and log(COR). These bias-corrected point estimators and improved variance estimators improve 
the moment-based Neymanian inference asymptotically, and we call them improved Neymanian 
inference hereinafter. We provide technical details about bias and variance reduction in Appendix 
A with proofs in the Supplementary Materials. 

5.2 Independent Binomial models versns Neymanian inference 

In current clinical practice, the following independent Binomial models are widely used: 


nil ~ Binomial(A"i,pi), noi ~ Binomial(A^O)Po)) nulLnoi. 


( 22 ) 


In the model above, nn and noi are assumed to be Binomial random variables. Such an assumption 
cannot, however, be justified by randomization using the potential outcomes model. 

The maximum likelihood estimators for pi — pQ,log{pi) — log(po); logit(pi) — logit(po) are the 
same as r, log(CRR), log(COR), and their asymptotic variances (Woolf 1955; Rothman et al. 2008) 
can be estimated by 


f>Bin 

^CRD 


T/Bin _ - - j_^ - _ 

rnuTi —-TW T-TW — 


Pi(l -Pi) Po(l -Po) 


Ni 


No 


1 


1 


1 


1 


^CRR 


nio noo 


nil A^i noi No nuNi noiNo' 


r>Rin 1111 

l^COR —-1-1-1-• 


(23) 

(24) 

(25) 


nil nio noi noo 

Here, the superscript “Bin” is for “Binomial” models. For CRD and log(COR), the estimated 
variances under independent Binomial models are the same as Neymanian inference assuming con¬ 
stant causal effects. However, this does not hold for log(CRR). One sufficient condition for the 
equivalence of the variances from Neymanian inference and independent Binomial models is 


^ , or equivalently, pi = po, 

No noi 

which essentially assumes the null hypothesis of zero average causal effect. 

However, all the conclusions here are based on the constant causal effects assumption which may 
not be realistic in applications with binary outcomes. Without assuming constant causal effects and 
by using the new sharp bound for 5^ in Q, we obtain different results from independent Binomial 
models, as shown in Appendix A. One surprising property of the log odds ratio is that the variance 
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estimator under independent Binomial models (25) is symmetric with respect to treatment and 


outcome, which coincides with the randomization-based variance estimator (21) assuming = 0. 


However, the true variance of log(COR) over all possible randomizations, (20), and the improved 


variance estimator in Appendix A, (A.4), do not have this symmetry. 


5.3 Bayesian inference for general causal measures 

As shown above, Neymanian randomization inference for nonlinear measures of causal effects in¬ 
volves tedious algebra, and relies on asymptotics under regularity conditions. In contrast, the 
Bayesian inference for log(CRR) and log(COR) is quite natural, once we impute all the missing 
potential outcomes based on their posterior predictive distributions. 

For example, under the independent potential outcomes model, we have 

log(CRR)|B^,r°'^^7^l+,7^+l ~ 

Vnoi + BiJ 

log(COR)|VF,l"°'"^7^l+,7^+l ~ log ( ^ - log ( ) • 

\N-nii-BoJ \N-noi-BiJ 


Also, we can apply the Bayesian sensitivity analysis technique, similar to Section 4.3, and obtain 


log(CRR)|l^,r°^^7^l+,7^+l 

log(COR)|W,r°^^7^l+,7^+l 






log 

log 


/ nil + Rqi + Boq \ 

Vnoi + Rii + Bio) ’ 
f nil + Bqi -|- Bqo \ 
\N — nil — Roi — Bqo) 


( npi -I- Rii — Rio \ 

\N — not — Bii — Bio) 


The posterior distributions of these causal measures can then be approximated by Monte Carlo. 


6. SIMULATION STUDIES 

In order to compare the finite sample properties of Neyman’s original method, the modified Ney- 
man’s method, and the Bayesian method assuming independent potential outcomes, we conduct 
two sets of simulation studies with independent and positively associated potential outcomes. The 
first set listed as Cases 1-5 in Table represent independent potential outcomes, while those listed 
as Cases 6-12 represent positively associated potential outcomes. Table also shows the marginal 
variances, correlations of the potential outcomes, and causal measures for each set of potential 
outcomes. To save space in the main text, we present only the results for CRD and log(COR). 
The results for log(CRR) and the simulation studies for negatively associated potential outcomes 
are discussed in the online Supplementary Materials. 
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Table 5: “Science table” for the simulation studies 


Case 

Nil 

Nio 

Noi 

Noo 


n2 

Sio 


r 

log(CRR) 

log(COR) 

1 

50 

50 

50 

50 

0.251 

0.251 

0.000 

0.503 

0.000 

0.000 

0.000 

2 

30 

70 

30 

70 

0.251 

0.211 

0.000 

0.462 

0.200 

0.511 

0.847 

3 

30 

90 

20 

60 

0.241 

0.188 

0.000 

0.430 

0.350 

0.875 

1.504 

4 

80 

20 

80 

20 

0.251 

0.161 

0.000 

0.412 

-0.300 

-0.470 

-1.386 

5 

60 

20 

90 

30 

0.241 

0.188 

0.000 

0.430 

-0.350 

-0.629 

-1.504 

6 

60 

40 

40 

60 

0.251 

0.251 

0.050 

0.402 

0.000 

0.000 

0.000 

7 

50 

50 

30 

70 

0.251 

0.241 

0.050 

0.392 

0.100 

0.223 

0.405 

8 

50 

70 

30 

50 

0.241 

0.241 

0.010 

0.462 

0.200 

0.405 

0.811 

9 

40 

110 

10 

40 

0.188 

0.188 

0.013 

0.352 

0.500 

1.099 

2.197 

10 

70 

30 

50 

50 

0.251 

0.241 

0.050 

0.392 

-0.100 

-0.182 

-0.405 

11 

50 

30 

70 

50 

0.241 

0.241 

0.010 

0.462 

-0.200 

-0.405 

-0.811 

12 

30 

10 

no 

50 

0.161 

0.211 

0.010 

0.352 

-0.500 

-1.253 

-2.234 


For given potential outcomes, we draw, repeatedly and independently, the treatment assignment 
vectors 5000 times, obtain the observed outcomes, and then apply three methods; Neymanian infer¬ 
ence assuming constant treatment effects, improved Neymanian inference, and Bayesian inference 
assuming independent potential outcomes. The improved Neymanian inference means using the 
improved variance estimator Q for CRD, and bias-corrected estimators (|A.l ) and (A.3) and im¬ 


proved variance estimators (A.2) and (A.4) for nonlinear causal measures log(CRR) and log(COR). 
Comparison of these methods are summarized in Figure with average biases, average lengths of 
the 95% confidence/credible intervals, and the coverage probabilities. 

First, the bias-corrected estimators for nonlinear causal measure log(COR) do have smaller 
biases than the original Neymanian estimators and Bayes estimators in most cases. Second, the 
confidence intervals from the modified Neyman’s method are narrower than Neyman’s original 
method, while still maintaining correct coverage properties. They indeed improve Neyman’s original 
method. Third, the average widths of the Bayesian credible intervals are much narrower than the 
original and modified Neyman’s method. Moreover, when the potential outcomes are independent. 
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the Bayesian credible intervals have correct frequentists’ coverage property. When the potential 
outcomes are positively associated and the average causal effect is small, the Bayesian credible 
intervals slightly under-cover the truth. The results in the Supplementary Materials show that when 
the potential outcomes are negatively associated, even the narrowest Bayesian credible intervals 
over-cover the true causal measures, and the Neymanian intervals and their modification are too 
“conservative.” 

As suggested by a reviewer, it is also interesting to investigate the frequency coverage property of 
the improved variance estimator Q for CRD under the sharp null. Table [^compares the frequency 
properties of Neymanian variance estimator Q and its improved version Q, with moderate sample 
size = 30 and different choices of Nn and Nqq such that A^i = Nq = 15. Except for the case with 
A^ii = Nqq = 15, the improved variance estimators have shorter confidence intervals but preserve 
the same coverage rates. Under the sharp null, with sample size N = 30, the sampling variance 
of T is maximized at A^n = Nqq = 15, and in this case the adjusting term |r|(l — |r|)/(A^ — 1) 
has the wildest behavior. Therefore, in practice, if we have small sample sizes and observe that 
Pi ~ Po ~ 0.5, the improved variance estimator may hurt our inference. In all other situations, we 
suggest using the improved variance estimator. 

Table 6: Neymanian and its improved variance estimators for CRD, ([^ and Q, under the sharp 
null hypothesis with N = 30 and A'lo = A'oi = 0 


iVii 

A'oo 

Length Coverage 

(Using Neyman’s estimator) 

Length'^ Coverage‘s 

(Using corrected estimator) 

20 

10 

0.686 

0.951 

0.644 

0.951 

25 

5 

0.542 

0.959 

0.491 

0.959 

15 

15 

0.728 

0.971 

0.683 

0.858 

12 

18 

0.713 

0.942 

0.672 

0.942 

8 

22 

0.633 

0.96 

0.601 

0.96 
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7. APPLICATION TO A RANDOMIZED CONTROLLED TRIAL 


This example is taken from Bissler et al. (2013), where the authors compare the rate of adverse 
events in the treatment group versus the control group. The adverse event naspharyngitis occurred 
in 19 out of 79 subjects in the treatment group with everolimus, and it occurred in 12 out of 39 
subjects in the control group. Therefore, the 2x2 table representing the observed data has cell 
counts (nil, nio, ^oi) ^oo) = (19,60,12, 27). In Figure [3(a)| we show the results for three causal mea¬ 
sures using Neymanian inference, modified Neymanian inference, and Bayesian posterior inference 
assuming independent potential outcomes. The results match with those in our simulation studies 
in the sense that the bias-corrected estimators are slightly different from the original estimators, 
and the Bayes posterior credible intervals are much narrower than the confidence intervals from 
Neymanian inference. However, in this particular example, all intervals cover zero. 

Since the independence assumption between potential outcomes has a strong impact on the 
Bayesian inference for the finite population causal measures, we conduct a sensitivity analysis as 


proposed in Section 4.3 by varying log( 7 ) within [—2,4], and obtain Bayesian credible intervals 
for the causal measures at each log( 7 ). Figure [3(b) shows the sensitivity analysis for CRD and 
log(COR), with similar patterns log(CRR) as shown in the Supplementary Materials. Finally, the 
widths of the credible interval depend on log( 7 ); however, in the example, even the widest credible 
intervals are narrower than the “conservative” Neymanian confidence intervals. 


8. DISCUSSION 

In this paper, we have discussed causal inference of completely randomized treatment-control studies 
with binary outcomes under the potential outcomes model. We first made a connection between the 
Fisher randomization test (Fisher 1935a) and Fisher’s exact test (Fisher 1935b) for binary outcomes, 
and proposed a procedure which uniformly dominates Neyman (1923)’s method. Although widely 
used in clinical practice, statistical inference for general nonlinear causal measures are based on the 
assumption of independent Binomial models, which is not justified by randomization. Based on 
randomization, our asymptotic analysis shows that the widely used variance estimators are either 
incorrect or inefficient, unless the null hypothesis of zero average causal effect is true. 

Ding (2014) shows that the Neyman’s test for zero average causal effect tends to be more pow- 
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erful than Fisher’s test for zero individual causal effect for many realistic cases including balanced 
designs. Our variance estimator Q further improves Neyman’s test. Our new result is not con¬ 
tradictory to the classical result that Fisher’s exact test is the uniformly most powerful unbiased 
test for equal probability of two independent Binomials (Lehmann and Romano, 2006). Both the 
Fisherian and Neymanian approaches are derived under the potential outcomes model, but the 
classical result for Fisher’s exact test is derived under the independent Binomial models. 


Traditionally, the variance formulae in (21) and (25) for log(COR) have been used in both 
experimental and observational studies (including both prospective and retrospective observational 


studies). Due to the symmetry of the variance formulae in (21) and (25) with respect to the treat¬ 
ment and outcome, researchers found that statistical inference of the log odds ratio measure is 
invariant to the sampling scheme (experimental study, prospective or retrospective observational 
studies), which was regarded as a celebrated and also mysterious feature of the log of the odds 
ratio. As a pioneer in epidemiology and biostatistics. Cornfield (1959) said that “there is a dis¬ 
tinction seems undeniable, but its exact nature is elusive,” when he was discussing experimental 
and observational studies. However, randomized experiments are fundamentally different from ob¬ 
servational studies, and especially different from retrospective observational studies. Under the 
potential outcomes model with the potential outcomes treated as fixed quantities, the randomness 
of the observed outcomes comes only from the physical randomization in experiments. Therefore, 
the treatment and the observed outcome are asymmetric unless the sharp null is true, and the 
variance in (20) for log(COR) and its estimator in ( |A.4 ) reflect the asymmetric nature explicitly. 
In a recent comment on Cornfield (1959), Rubin (2012) suggested revealing the hidden nature of 
different studies using potential outcomes. Indeed, our results verify Rubin (2012)’s conjecture. 

In order to reveal the importance of the correlation between potential outcomes and the intrinsic 
lack of additivity for binary outcomes, we focus our discussion on two by two tables from completely 
randomized experiments. The same idea can be also applied to observational studies as long as 
the ignorability assumption holds. We can either stratify on the observed covariates or propensity 
scores (Rosenbaum and Rubin 1983), and then within each strata the data can be approximately 
viewed as generated from randomized experiments (Rosenbaum 2002). The findings of this paper 
can be generalized in many ways. For instance, we can discuss Neymanian randomization inference 
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for full factorial or fractional factorial designs with binary outcomes, since the current discussion 
(Dasgupta et al. 2015) is restricted to continuous outcomes. It will also be interesting to discuss 
causal inference under the potential outcomes model for general outcomes (categorical data, counts, 
survival times, etc.). These topics are our ongoing or future research projects. 


APPENDIX A. BIAS AND VARIANCE REDUCTION FOR NONLINEAR 

CAUSAL MEASURES 

Result for log(CRR). A bias-corrected estimator for log(CRR) is 

i„g(ciar = i„g(CRR) + ^»?-^»§. (A.1) 

with improved variance estimator 


^CRR = ^CRR - 


1^1(1 - 1^1) 

Pi%{N - 1)’ 


(A.2) 


where VcRR is defined in (19). 


Result for log(COR). A bias-corrected estimator for log(COR) is 


log(COR)'^ 


log(COR) + 


1 - 2pi Aq 2 
2p2(l-pi)2iViiV^l 


1 — 2po Al 2 
2p2(l-po)2iVoA"°’ 


with improved variance estimator 


(A.3) 


^COR — i^^OR - 


1^1(1 - 


pi{l -pi)po{l -Pq)IN - 1)’ 


where VcoR is defined in (21). 


(A.4) 


SUPPLEMENTARY MATERIALS 

In the Supplementary Materials, Appendix B contains two useful lemmas and their proofs, and 
Appendices C and D provide proofs of all the theorems and results in Appendix A above. Appendix 
E presents details about the simulation studies, especially the case with negatively associated 
potential outcomes and relatively small sample sizes. Appendix F gives more details about the 
example in Section 


23 










REFERENCES 


Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using 
instrumental variables (with discussion). Journal of the American Statistical Association, 
91, 444-455. 

Berkson, J. (1978). Do the marginal totals of the 2x2 table contain relevant information respecting 
the table proportions? (with discussion) Journal of Statistical Planning and Inference, 2, 43- 
44. 

Bissler, J. J. et al. (2013). Everolimus for angiomyolipoma associated with tuberous sclerosis com¬ 
plex or sporadic lymphangioleiomyomatosis (EXIST-2): a multicentre, randomised, double¬ 
blind, placebo-controlled trial. The Lancet, 381, 817-824. 

Chernoff, H. (2004). Information, for testing the equality of two probabilities, from the margins of 
the 2x2 table. Journal of Statistical Planning and Inference, 121, 209-214. 

Cochran, W. G. (1977). Sampling Techniques. John Wiley & Sons: New York. 

Copas, J. B. (1973). Randomization models for the matched and unmatched 2x2 tables. Biometrika, 
60, 467-476. 

Cornfield J. (1959). Principles of research. American Journal of Mental Deficiency, 64, 240-252. 

Cox, D. R. (1958). Planning of Experiments. Wiley: New York. 

Dasgupta, T., Pillai, N., and Rubin, D. B. (2015). Causal inference from 2^ factorial designs using 
the potential outcomes model. Journal of the Royal Statistical Society, Series B (Statistical 
Methodology), in press, available at http://arxiv.org/abs/1211.2481. 

Ding, P. (2014). A paradox from randomization-based causal inference. Available at http:// 
arxiv.org/abs/1402.0142, 

Fisher, R. A. (1935a). The Design of Experiments, 1st edn. Oliver &: Boyd: Oxford, England. 

Fisher, R. A. (1935b). The logic of inductive inference. Journal of the Royal Statistical Society, 98, 
39-82. 

Cadbury, G. L. (2001). Randomization inference and bias of standard errors. The American 
Statistician, 55, 310-313. 

Kempthorne, O. (1952) The Design and Analysis of Experiments. Wiley: New York. 

Lehmann, E. L. and Romano, J. P. (2006). Testing Statistical Hypotheses. Springer: New York. 


24 



Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on 
principles (with discussion). Section 9 (translated). Statistical Science, 5, 465-480. 

Robins, J. M. (1988). Confidence intervals for causal parameters. Statistics in Medicine, 7, 773-785. 

Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational 
studies for causal effects. Biometrika, 70, 41-55. 

Rosenbaum, P. R. (2002).Observational Studies. Springer: New York. 

Rothman, K. J., Greenland, S., and Lash, T. L. (2008). Modern Epidemiology. Lippincott Williams 
& Wilkins: Philadelphia, PA. 

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized 
studies. Journal of Educational Psychology, 66, 688-701. 

Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. Journal of 
Educational Statistics, 2, 1-26. 

Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals 
of Statistics, 6, 34-58. 

Rubin, D. B. (1980). Comment on “Randomization analysis of experimental data: the Fisher 
randomization test” by D. Basu. Journal of the American Statistical Association, 75, 591- 
593. 

Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied 
statistician. The Annals of Statistics, 12, 1151-1172. 

Rubin, D. B. (1990). Neyman (1923) and causal inference in experiments and observational studies. 
Statistical Science, 5, 472-480. 

Rubin, D. B. (2005). Causal inference using potential outcomes: design, modeling, and decisions. 
Journal of the American Statistical Association, 100, 322-331. 

Rubin, D. B. (2010). Reflections stimulated by the comments of Shadish (2010) and West and 
Thoemmes (2010). Psychological Methods, 15, 38-40. 

Rubin, D. B. (2012). Potential updates to Cornfield’s 1959 “Principles of Research”. Statistics in 
Medicine, 31, 2778-2779. 

Woolf, B. (1955). On estimating the relation between blood group and disease. Annals of Human 
Genetics, 19, 251-253. 


25 



Yates, F. (1984). Tests of significance for 2x2 contingency tables (with discussion). Journal of the 


Royal Statistical Society, Series A, 147 , 426-463. 


Supplementary Materials for 

“A Potential Tale of Two by Two Tables from Completely 

Randomized Experiments” 

In the Supplementary Materials, Appendix B contains two useful lemmas and their proofs, and 
Appendix C and D provides proofs for all the theorems and the results in Appendix A above. 
Appendix E presents details about the simulation studies, especially the case with negatively asso¬ 
ciated potential outcomes. Appendix F gives more details about the example in Section 7. 


APPENDIX B. LEMMAS AND THEIR PROOFS 
Lemma A.l. The completely randomized treatment assignment W satisfies E(Wi} = Ni/N, 
var(ITi) = NiNo/N^, and cov(ITi, Wj) = -NiNo/{N^{N - 1)}. If (ci, ■■■ ,cn) and (di, • • • , djv) 
are constants with c = d/A^ and d = di/N, we have 

^ \ N N ^ 

^ ^ ^ ^ ^idij = ^(Cj - c){di - d). 

Proof of Lemma \A.1\ The observed outcomes in the treatment and control can be viewed as two 
sets of simple random samples from the finite population of {Yi(l) : i = 1, • • • , N} and {Yi(0) : i = 
1, • • • , N}, respectively. Therefore, the conclusion follows from classic survey sampling textbooks 
such as Cochran (1977). □ 


Lemma A. 2. The estimators, pi and po, are unbiased for pi and po, with variances and covariance: 
var(pi) = Yaiipo) = ^^Si, cov{pi,po) =-^Sio =-^{Sf + S^ - S^). 


Proof of Lemma A.J[ The unbiasedness and variances of pi and po follow directly from Lemma 


A.l The covariance between pi and po is 


COvfe.po) = 
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Summing from i = 1 to N over the following decomposition 


2{Yi{l) - pi}{Yi{0) - po} = {y.(l) - pi}^ + {^^(0) - po}^ - in - rf, 


we have 2S'io 




5^, and therefore the covariance can also be expressed as 

cov(pi,po) = + 5"^ - S^). 


□ 


APPENDIX C. PROOFS OF THE THEOREMS 

Proof of Theorem 1. Define the proportions pjk = Nj^/N. We first rewrite S^/N as 

1 


N 


N 

N{N -1) 

1 


n - T 


(A^io + ^01) — 


iV(iV- 1) 

+ 2poi - r^). 


(iVio-iVoi)- 

N 


In order to find the lower bound of S'^/N, we only need to find the lower bound for poi- This 

reduces to the following linear programming problem: 

/ 

min Poi 

Pll,Pl0,P01,P00 

S.t. Pll+PlO=Pl, 

' Pll+P01=P0, 

Pll +P10 +Poi +Poo = 1, 

> 0,i, j = 0,1. 

Since poi = Po — Pi + Pio > —t and poi > 0, the lower bound of poi is 


Poi > max(-T, 0), 


and therefore, the lower bound of S‘^/N is 

52 


> 


{r + 2 max(—r, 0) — r^} = ——-{max(—r, r) — r^} = 


N - N-I'' V > ^ iV-W ' ’ ^ iV-1 

From the derivation above, the bound is sharp, and is attained if and only if pio = 0 or poi = 0. 
Or, equivalently, attains its minimum at either of the two vertices within the feasible region 
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of the linear programming problem above: (pii,pio,Poi)Poo) = {pi,0, —t,1 — po) if t < 0, and 
{pn,Pio,Poi,Poo) = ipo,T,0, 1 -pi) if r > 0. □ 


Proof of Theorem 2. Define N'^ = + ay^ + j3y^ and = {riwi + a^)/N[jj as the sample sizes and 

proportions adjusted by the pseudo counts of the prior distributions. The posterior means of 7ri+ 
and TT+i are 

^(7ri+ I and .B(7r+i | 

The posterior variances of 7ri+ and vr+i are 

var(7ri+| = and var(7r+i | W, 

Immediately, we have 

£{i,+(i-ii+)i w.Y"»‘) = a(i -a) - -a), 

B{,+i{i-7r+,)Hv,r"»‘) = a(i - ft) - ~ y - - ft)- 

Applying the laws of conditional expectation and variance to (11) in the main text, we obtain the 
posterior mean 


E{t I W, 
E^E{t I 


■)} 


= E 


,7ri+,7r+ij 
nil + A'o7ri+ - npi - iVi7r+i 

N 


W,Y 


obs 


nil + - «oi - 


N 


N[ + No^ 


Nq + Ni^ ai — ao 

-Po - 


N 


N 


and posterior variance 


var(r | W, Y 


obs\ 


= E 


|var(r I VF,l"°'’^7rl+,7^+l)} +var{^;(T | 1^, 7ri+, vr+i)} 


= E { ^7ri+(l - 7ri+) + ^vr+i(l - vr+i) | 1^, 


No 


Ni 


+ var { ^vri+ - ^tt+i | W, Y 


obs 




N‘^{N[ + 1) 


ih(i-Fi) + 


N^iK + 1' 


m-Po) + 


Ni 


N^iN[ + l) 


Mi-Pi) + 


Nf 


iV2(iV' + 1) 


To(l-i^o) 


NoiN[ + No) ^i(l - Fi) ^ _iVi(iVi + IV') ^o(l - ^o) 




N[ + l 


iV2 


iV^ + 1 
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When we have large sample size, the prior pseudo counts are overwhelmed by the observed counts 
Ujk’s, and the posterior mean and variance of r can be approximately by 


Eir\ 


- r, 

var(r 


NoPi{l-pi) A^ipo(l-Po) 
N Ni-1 N No-1 


□ 


Proof of Theorem 3. Applying Taylor expansion, we have 


log(CRR) - log(CRR) = -{pi-pi) - — (po -po) + Op 

Pi Po 


Ari/2 


According to Lemma A.2 the asymptotic variance of log(CRR) is 


^var(pi) + ^var(po)-^cov(pi,po) 

PI Po PiPo 


pfNiN ^ p^qNoN 


PiPoN 


Nipi + NoPo ^2 , ^iPl + ^OPO e 2 

■'^1 2 AT AT *^0 “ 
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pfpoNiN 


PiP^NqN 


Nipi + Nqpo / ^ Sn 


+ 


PiPoN 

Sf 


Si 


piPoN \Nipi Nopo Nipi+Nopo 
Assume = 0, and we can estimate the asymptotic variance by 


VcRR = 


A^lPl + NqPo 2 , ^iPl + ^oPo 2 


Si + 


’0 


pIpoNiN pipINqN 

Nipi + NqPo ^ Nipi + NoPo No ^ ^ 

-2- at at 1\T - 3Pn^-PV + ^ -2 nr Ar ITt - ^Po{^ - Po) 


PlPoNiN A^i - 1 


PiP5A^oA^ A^o -1 


(A^iPi + A^oPo)(l - i>i) (A^iPi + A^oPo)(l - Po) 


PiPo{Ni -1)N 
nio {nii + noi)No 


+ 


PiPo{No - 1)N 

noo (nii+noi)A^i 


(A.l) 


nii(A^i-l) noiA^ noi(A^o - 1) nnN 

Ignoring the difference between and {N^ — 1) {w = 0,1) in asymptotic analysis, we obtain the 


formula in Theorem 3. 


□ 


Proof of Theorem 4- Applying Taylor expansion, we have 


log(COR) - log(COR) = 


1 


Pi(l -Pi) 


{Pi -Pi) - 
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Po(l - po) 


(PO - Po) + Op 
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AT/2 I ■ 
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According to Lemma A.2 the asymptotic variance of log(COR) is 
1 . 1 . 2 


p2(l -pi)2 

1 No 


vai{pi) + 




Poi'^-Po)^ 
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var(po) - 


pi(l -pi)po(l -Po) 

, 1 
° Pi(l-pi)po(l-Po)A^ 
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Po) 


-Po) 


The Neyman-type “conservative” variance estimator for the asymptotic variance is 
^COR ^ 


iViPi(l-pi) + Aopo(l- pq) ^2 , A^iPi(l-Pi) + A^oPo(l-Po) „2 

•^1 T ^ /I ^ \2 AT AT '^0 


pf(l -pi)2po(l -po)A^iVi 
fViPi(l - Pi) + iVoPo(l - Po) 
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pi(l - pi)Po(l - po)‘^NNo 
iViPi(l - pi) + A^oPo(l - Po) 
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_ iViPi(l - pi) + A^oPo(l - Po) / 1 1 A 

pi(l-pi)po(l-po) \NNi NNo) 

_ (npi + noo)niinio + (nn + nio)noinoo 
niinion-oinoo 
fill 

—-1-1-1-, 

nil R -10 R-oi ’^oo 

where the approximation is due to the difference between — 1 and for w = 0,1. 


(A.2) 

□ 


APPENDIX D. PROOFS FOR THE RESULTS IN APPENDIX A ABOUT BIAS 
AND VARIANCE REDUCTION FOR NONLINEAR CAUSAL 

MEASURES 

Proof of the Result for log(CRR). Applying Taylor expansion, we have 


log(CRR) - log(CRR) = —{pi -pi) - ^{pi -pi)^ - — (po -po) + 

Pi 2pf Po 2pg 


-Po)‘^ + Op{- 


Therefore, the asymptotic bias of log(CRR) is 


No 


-Tvarffi) + ivar(w) = -^^,^^A, 


5? + 


Ni 


2pt,NoN 


.q2 


05 


and the bias-corrected estimator for log(CRR) in Appendix A can be obtained by subtracting the 
estimated asymptotic bias from log(CRR). □ 
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Proof of the Result for log(COR). Applying Taylor expansion, we have 


log(COR) - log(COR) 

1 . l-2pi 


Pi(l -Pi) 

1 


(pi -Pi) - 


2pf(l -pi)2 


(pi -Pif 


-{po-po) + 


1 - 2po 


' 2pg(l-po)^ 

Therefore, the asymptotic bias of log(COR) is 


-PoY + Op{ ^ 


l-2pi 


^var(pi) + 


1 - 2po 


^var(po) = 


1 - 2pi No 


Sf + 


1 - 2po 


*^0) 


2pf(l-pi)2 2pl{l-poy 2pf(l-pi)2 iViiV 2pl{l - Po)^ NqN 

and the bias-corrected estimator for log(COR) in Appendix A can be obtained by subtracting the 
estimated asymptotic bias from log(COR). 


□ 


APPENDIX E. MORE SIMULATION STUDIES 

In order to compare the finite sample properties of Neyman’s original method, the modified Ney- 
man’s method, and the Bayesian method, we conduct the following set of simulation studies. In the 
main text, we choose the following two sets of potential outcomes: the first set of potential outcomes 
(All, Aio, Aoi, Aoo) are independent: (50, 50, 50, 50), (30, 70, 30, 70), (30, 90, 20, 60), (80, 20,80, 20), 
(60,20,90,30); the second set of potential outcomes are positively associated: (60,40,40,60), 
(50,50,30,70), (50,70,30,50), (40,110,10,40), (70,30,50,50), (50,30,70,50), (30,10,110,50). In 
addition, in this Supplementary Materials, we also choose negatively associated potential outcomes: 

(40, 60, 60,40), (30, 70, 50, 50), (40, 80,40,40), (30,120, 20, 30), (50, 50, 70, 30), (40,40, 80,40), (20, 20,120,40). 
We summarize the “Science” in Table A.7[ where Cases 1-5 represent independent potential out¬ 
comes, Cases 6-12 represent positively associated potential outcomes, and Cases 13-19 represent 
negatively associated potential outcomes. 

For given potential outcomes, we draw, repeatedly and independently, the treatment assign¬ 
ment vectors 5000 times, and apply the three methods after obtaining the observed outcomes. 

We compare three methods: Neymanian inference assuming constant treatment effects, improved 
Neymanian inference, and Bayesian inference assuming independent potential outcomes. 

The results are summarized in Figure A.4, A.5 and |A.6 with average biases, average lengths of 
the 95% confidence/credible intervals, and the coverage probabilities. The main text only reports 
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the results for CRD and log(COR), here we report the results for all causal measures. When the 
potential outcomes are independent or positively associated, the results for log(CRR) are similar 
to those for log(COR) as discussed in the main text. When the potential outcomes are negatively 
associated, all the interval estimates over cover the true causal measures, while the Bayesian credible 
intervals are the narrowest. 


APPENDIX F. MORE DETAILS ABOUT THE APPLICATION 

As in the main text, the example is taken from Bissler et al. (2013), and they compare the rate of 
adverse events in the treatment group versus the control group. The adverse event naspharyngitis 
occurred in 19 among 79 subjects in the treatment group with everolimus, and it occurred in 12 
among 39 subjects in the control group. Therefore, the 2x2 table representing the observed data 


has cell counts (nn, nio, noi, n-oo) = (19,60,12,27). Figure A.7 shows the sensitivity analysis for 
CRD,log(CRR) and log(COR), with similar patterns for all of them. 
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Bias 


Length 


Coverage 



(a) Independent Potential Outcomes; Cases 1 to 5 with the x-axis denoting the case numbers 


Bias Length Coverage 



6 7 8 9 10 11 12 6 7 8 9 10 11 12 6 7 8 9 10 11 12 



(b) Positively Associated Potential Outcomes: Cases 6 to 13 with the x-axis denoting the case number 

Figure 2: Simulation Studies. Each subfigure is a 2 x 3 matrix summarizing results for 2 parameters 
and 3 properties. Note that “Neyman” and “Bayes” are indistinguishable for biases of log(COR). 
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CRD 


log(CRR) 


log(COR) 


Neyman 

Neyman''c 

Bayes 



-0.2 -0.15 -0.1 -0.05 0 0.05 0.1 


Neyman 

Neyman^c 

Bayes 



-0.8 -0.6 -0.4 -0.2 0 0.2 


Neyman 

Neyman''c 

Bayes 



(a) Inference for the Causal Measures. We apply Neymanian, improved Neymanian (Neyman^ above) and Bayesian 
approaches with the segments representing the 95% confidence/credible intervals and centers illustrating the point 
estimators. 


sensitivity anaiysis for CRD 



-2-101234 

log(y) 

sensitivity anaiysis for iog(COR) 



-2-101234 

log(y) 


(b) Bayesian Sensitivity Analysis for CRD and log(COR). The intervals named “independence” are the 95% posterior 
credible intervals under independence of the potential outcomes, and the intervals named “widest” are the widest 95% 
credible intervals over the ranges of the sensitivity parameters. 

Figure 3: A Randomized Experiments with Observed Data (nn,nio,noi,npo) = (19,60,12,27). 
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Table A.7 

; “Science table’ 

’ for the simulation studies 


Case 

iVii 

iVio 

iVoi 

A^oo 


C.2 

Sio 

S^r 

T 

log(CRR) 

log(COR) 

1 

50 

50 

50 

50 

0.251 

0.251 

0.000 

0.503 

0.000 

0.000 

0.000 

2 

30 

70 

30 

70 

0.251 

0.211 

0.000 

0.462 

0.200 

0.511 

0.847 

3 

30 

90 

20 

60 

0.241 

0.188 

0.000 

0.430 

0.350 

0.875 

1.504 

4 

80 

20 

80 

20 

0.251 

0.161 

0.000 

0.412 

-0.300 

-0.470 

-1.386 

5 

60 

20 

90 

30 

0.241 

0.188 

0.000 

0.430 

-0.350 

-0.629 

-1.504 

6 

60 

40 

40 

60 

0.251 

0.251 

0.050 

0.402 

0.000 

0.000 

0.000 

7 

50 

50 

30 

70 

0.251 

0.241 

0.050 

0.392 

0.100 

0.223 

0.405 

8 

50 

70 

30 

50 

0.241 

0.241 

0.010 

0.462 

0.200 

0.405 

0.811 

9 

40 

no 

10 

40 

0.188 

0.188 

0.013 

0.352 

0.500 

1.099 

2.197 

10 

70 

30 

50 

50 

0.251 

0.241 

0.050 

0.392 

-0.100 

-0.182 

-0.405 

11 

50 

30 

70 

50 

0.241 

0.241 

0.010 

0.462 

-0.200 

-0.405 

-0.811 

12 

30 

10 

no 

50 

0.161 

0.211 

0.010 

0.352 

-0.500 

-1.253 

-2.234 

13 

40 

60 

60 

40 

0.251 

0.251 

-0.050 

0.603 

0.000 

0.000 

0.000 

14 

30 

70 

50 

50 

0.251 

0.241 

-0.050 

0.593 

0.100 

0.223 

0.405 

15 

40 

80 

40 

40 

0.241 

0.241 

-0.040 

0.563 

0.200 

0.405 

0.811 

16 

30 

120 

20 

30 

0.188 

0.188 

-0.038 

0.452 

0.500 

1.099 

2.197 

17 

50 

50 

70 

30 

0.251 

0.241 

-0.050 

0.593 

-0.100 

-0.182 

-0.405 

18 

40 

40 

80 

40 

0.241 

0.241 

-0.040 

0.563 

-0.200 

-0.405 

-0.811 

19 

20 

20 

120 

40 

0.161 

0.211 

-0.040 

0.452 

-0.500 

-1.253 

-2.234 



1 


5 


Figure A.4: Simulation Results for Independent Potential Outcomes. Each subfigure is a 2 x 3 
matrix summarizing 3 repeated sampling properties (average bias, average length, and coverage of 
interval estimates) for 2 causal measures. Note that “Neyman” and “Bayes” are indistinguishable 
for biases of log(CRR) and log(COR). 
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Figure A.5: Simulation Results for Positively Associated Potential Outcomes. Each subfigure is a 
2x3 matrix summarizing 3 repeated sampling properties (average bias, average length, and coverage 
of interval estimates) for 2 causal measures. Note that “Neyman” and “Bayes” are indistinguishable 
for biases of log(CRR) and log(COR). 
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Figure A.6: Simulation Results for Negatively Associated Potential Outcomes. Each subfigure is a 
2x3 matrix summarizing 3 repeated sampling properties (average bias, average length, and coverage 
of interval estimates) for 2 causal measures. Note that “Neyman” and “Bayes” are indistinguishable 
for biases of log(CRR) and log(COR). 
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— log(COR) 

— Lower(Neyman‘^) 

— - Upper(Neyman‘^) 
• Mean(Bayes) 

■ Lower(Bayes) 

A Upper(Bayes) 


Figure A.7: Bayesian Sensitivity Analysis of the Trial with (nn, n-io, noi, noo) = (19,60,12,27). 
Three panels are for CRD, log(CRR), and log(COR), respectively. The intervals named “indepen¬ 
dence” are the 95% posterior credible intervals under independence of the potential outcomes, and 
the intervals named “widest” are the widest 95% credible intervals over the ranges of the sensitivity 


parameters. 
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